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ABSTRACT 

Bayesian model selection provides a formal method of determining the level of support 
for new parameters in a model. However, if there is not a specific enough underlying 
physical motivation for the new parameters it can be hard to assign them meaningful 
priors, an essential ingredient of Bayesian model selection. Here we look at methods 
maximizing the prior so as to work out what is the maximum support the data could 
give for the new parameters. If the maximum support is not high enough then one 
can confidently conclude that the new parameters are unnecessary without needing to 
worry that some other prior may make them significant. We discuss a computationally 
efficient means of doing this which involves mapping p-values onto upper bounds of the 
Bayes factor (or odds) for the new parameters. A p-value of 0.05 (1.96er) corresponds 
to odds less than or equal to 5:2 which is below the 'weak' support at best threshold. 
A p-value of 0.0003 (3.6a) corresponds to odds of less than or equal to 150:1 which is 
the 'strong' support at best threshold. Applying this method we find that the odds on 
the scalar spectral index being different from one are 49:1 at best. We also find that 
the odds that there is primordial hemispherical asymmetry in the cosmic microwave 
background are 9:1 at best. 



1 INTRODUCTION 



When there are several competing theoretical models, 
Bayesian model selection provides a formal way of evalu- 
ating their relative probabilities in light of the data and any 
prior information available. A common scenario is where a 
model is being extended by adding new parameters. Then 
the relative probability of the model with the extra parame- 
ters can be compared with that for the original model. This 
provides a way of evaluating whether the new parameters are 
supported by the data. Often the original model is "nested" 
in the new model in that the new model reduces to the orig- 
inal model for specific values of the new parameters. The 
Bayesian framework automatically implements an Occam's 
razor effect as a penalization factor for less predictive models 
- the best model is then the one that strikes the best balance 
betw een goodness of fit and economy of parameters (jTrottal 
l2007f ). 

For nested models, the Occam's razor effect is controlled 
by the volume of parameter space enclosed by the prior 
probability distributions for the new parameters. The rel- 
ative probability of the new model can be made arbitrarily 
small by increasing the broadness of the prior. Often this 
is not problematical as prior ranges for the new parameters 
can (and should) be motivated from the underlying theory. 
For example, in estimating whether the scalar spectral in- 
dex (n) of the primordial perturbations is equal to one (see 
Sec.f?}, the prior range of the index can be constrained to be 
0.8 ;$n<5 1.2 by assuming the perturbations were generated 
by slow roll inflation. The sensitivity of the model selec- 



tion result can also be easily investigated for other plausi- 
ble, ph ysically motivated choice of prior ranges (e.g.. iTrottal 
[|2007d lbk 

However, there are cases like the asymmetry seen in the 
WMAP cosmic microwave background (CMB) temperature 
data (see Sec. [5} where there is not a specific enough model 
available to place meaningful limits on the prior ranges of the 
new parameters. This hurdle arises frequently in cases when 
the new parameters are a phenomenological description of 
a new effect, only loosely tied to the underlying physics, 
such as for example expansion coefficients of some series. In 
these cases, an alternative is to choose the prior on the new 
parameters in such a way as to maximise the probability of 
the new model, given the data. If, even under this best case 
scenario, the new model is not significantly more probable 
than the old model, then one can confidently say that the 
data does not support the addition of the new parameters, 
regardless of the prior choice for the new parameters. 



2 UPPER BOUNDS ON THE BAYES FACTOR 

A model (Mo) may be compared to a new model with extra 
parameters (Mi) using the Bayes factor (also known as the 
odds) 



B 



p(zjMi) 
p(as|M ) ' 



(1) 



where x is the data and the model likelihood 
0, 1) is given by 



[x\Mi) (i 
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p(x\Mi] 



d9 iP (x\9 i ,M i )p{9 i \M i ) 



(2) 



with 9i denoting the parameters under model . The Bayes 
factor gives the change in the relative probability of the two 
models brought about by the data x, i.e. 



p(Mi\x) = P(Mi) 
p(M |ar) P{M y 



(3) 



where P(Mi) (i = 0, 1) are the prior probabilities for the 
two models and P(Mi\x) the posterior probabilities. The 
level of support is usually categorized as either 'inconclusive' 
(| In S < 1), 'weak' (1 |lnJ5| < 2.5), 'moderate' (2.5 < 
lnB| sC 5), or 'strong' (| lnB| > 5). 

We denote the new parameters by 9 and they are fixed 
to be 6* under the simpler model (we restrict our consider- 
ations to nested models). The Bayes factor is then, using a 
genera li zed v ersion of the Savage-Dickey density ratio (see 
iTrottal (|2007l l for details) 



B = 



p(g'lMi) 
p{9*\x,MiY 



(4) 



where p(9*\x, Mi) is the posterior distribution under Mi, 
evaluated at 9 = 9* . If p(9\Mi) is made sufficiently broad, 
p(9*\x, Mi) depends only on the likelihood. Thus, B can 
be made arbitrarily small by making p(9\Mi) sufficiently 
broad (since the prior must be normalized to unity proba- 
bility content, a broader p(9\Mi) corresponds to a smaller 
value of p(9*\Mi)). This is not problematical if the physi- 
cal model underlying Mi is specific enough to provide some 
well-motivated prior bounds on 9. When this is not the case, 
an upper bound can still be obtained on B by optimizing 
over all priors and choosing p{9\Mi) to be a delta function 
centered at the maximum likelihood value, # max . This is the 
choice that maximally favours Mi , and the upper bound on 
the odds is then 



B 



p(x\9 m ^,Mi) 
p(x\9*,Mo) 



(5) 



corresponding to the likelihood ratio between # max and 9*. 
However, such a choice of prior fails to capture that Mi is 
supposed to be a more complex model than Mo. Since 9* 
represents the theoretically motivated simpler hypothesis, 
it makes sense that the alternative hypothesis has a more 
spread out prior distribution for 9. Furthermore, if there is 
a priori no strong preference for either 9 > 9* or 9 < 9* , then 
it may be preferable to maximize over priors that are sym- 
metric about 9* and unimodal (the latter r equirement com- 
ing ag ain from a principle of indifference) . iBerger fc Sellkd 
l|l987f ) show that maximizing B over all such p(9\Mi) is the 
same as maximizing over all p(9\Mi) that are uniform and 
symmetric about 9*. We give an explicit example of this 
procedure in Eq. 

However, this optimization may be computationally 
prohibitive as evaluating Eq. @ usually r equires numerical 
evaluation of high dimensio nal integrals l)Mukheriee et all 
2006 ; iFeroz fc HobsonlteOOTft . An alternative way of obtain- 
ing an upper bound on B, that does not rely on explic- 
itly specifying a class of alte rnative priors for 9, is to use 
Bayesian calibrated p-values dSellke et alj|200lh . First, fre- 
quentist methods are used to obtain the p-value. To do this, 
a test statistic (t) needs to be chosen, with the general prop- 
erty that the larger the value the less well the data agree 



p— value 


B 


InB 


sigma 


category 


0.05 


2.5 


0.9 


2.0 




0.04 


2.9 


1.0 


2.1 


'weak' at best 


0.01 


8.0 


2.1 


2.6 




0.006 


12 


2.5 


2.7 


'moderate' at best 


0.003 


21 


3.0 


3.0 




0.001 


53 


1.0 


3.3 




0.0003 


150 


5.0 


3.6 


'strong' at best 


6 x 10~ 7 


43000 


11 


5.0 





Table 1. Translation table (using Eq. JHJl) between p-values and 
the upper bounds on the odds (B) between the two models. The 
'sigma' column is the corresponding number of standard devia- 
tions away from the mean for a normal distribution. In the 'cat- 
egory' column are the descriptions for the different categories of 
support reachable for the corresponding p-value. 



with Mq. A common choice is the improvement in the max- 
imum likelihood value when the additional parameters are 
allowed to vary. However, if the likelihood is computation- 
ally expensive to obtain, then other measures may be used. 
The p-value is given by 



p = p(t> t obB (x)\M ), 



(6) 



where t i, B (x) is the value of t estimated from the data. The 
key property of p-values is that if Mq is correct, and t is a 
continuos statistic, then the probability distribution of p will 
be uniform, p(p\Mo) = 1 for ^ p 1. The final result will 
not be sensitive to the precise choice of t. The only property 
needed for t is that it should be a continuous statistic and 
larger values of t should correspond to less agreement with 
Mo. It follows t hat p(p\Mi) will be monotonically decreasing 
for < p < 1. ISellke et alj (|200lh express the Bayes factor 
in terms of the distribution of the p-values 



B = 



p{p\Mi) 



= p(p\Mi). 



(7) 



p{p\M ) 

They look at a wide range of non-parameteric monotoni- 
cally decreasing distributions for p[p\Mi) and under mild 
regularity conditions, they find the upper bound 

-1 



B C B — 



eplnp 



(8) 



for p ^ e _1 , where e is the exponential of one. Table [T] 
lists B for some common thresholds of p and ln_B. Note 
how the p-value of 0.05 (a 95% confidence level result) only 
corresponds to an odds ratio upper bound of B — 2.5 and 
so does not quite reach the "weak" support threshold even 
for an optimized prior. Also note that in order order for 
the "strong" support threshold to be reachable, a ^ 3.6 is 
required. 

In general, for large sample size and under mild regular- 
ity conditions, the p-value for the addition of one or more 
new parameters can be estimated by finding the maximum 
likelihood with the new parameters fixed (C al ), and when 
the new parameters are allowed to vary (£ max ). Then the 
quantity 



Axeff = -21n(£^ ax /£ n 







(9) 



has a Chi squared distribution with the number of degrees 
of fre edom equal to the number of new parameters l|Wilks! 
Il93&t ). It is important to note that for this to be valid none 
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of the new parameters can have their fixe d values on the 
bound ary of the parameter space (see e.g. IProtassov et al.l 
(2002) for an astronomy-oriented example where this con- 
dition does not hold). The p-value can then be estimated 
by 



p— value sigma fraction of true nulls lower bound 



P = 



Xu{y)dy = 1 - 



I>/2,A Xc 2 ff /2) 
I>/2) 



(10) 



where xt is the Chi squared distribution with v degrees of 
freedom, and v is the number of new parameters. Eq. (|10p 
is simply the asymptotic probability of obtaining a A^ 2 as 
large or larger then what has actually been observed, Ax^h ■, 
assuming the null hypothesis is true. If the above procedure 
cannot be applied (for instance because the new parameters 
lie at a boundary of the parameter space), then the p-value 
can still be obtained by Monte Carlo simulations. 

A very different approach to estimating the Bayes factor 
without having to s pecify a prior is the Bayesian Informa- 
tion Criteria (BIC) (jSchwarzll 19781 ; iMagueiio fc Sorkinll2007l; 
lLiddlell2007l ). The BIC assumes a prior for th e new parame - 
ters which is equivalent to a single data point (iRaftervl 19951 ). 
Therefore, it will in general give lower values for B. The BIC 
is complementary to the upper bound for B presented here 
in that it provides a default weak rather than default strong 
prior. 



3 AN ILLUSTRATIVE EXAMPLE 

Consider the case where under Mo, x ~ A/"(/io, a) for fixed (j,o 
(the null hypothesis), while under the alternative Mi, x ~ 
N{n, a) and N data samples are available (with a known) . 
If the prior on [i is taken to be symmet ric about \i = fio and 
unimodal, then dBerger fc Sellkelll987r i 



B 



<j){K + t)+ <t>{K - t) 
20(t) 



(11) 



where t = y/N\x — fio\/c, <t>{y) = e y / ' 2 , and K is found by 
solving 



(12) 



K[</>{K + t) + <KK-t))= / <P(y)dy. 

J-(K+t) 

Alternatively, the p-value is given by 



<f>{y)^y ■ 



(13) 



K=- t 0bB 



This can be converted to a upper bound on the Bayes factor 
using Eq. J5). The results for the two methods are virtually 
identical and can be read off Table [T] where t is the number 
of sigma. 



ISellke et al.l (|200ll ) present an interesting simulation 
study of this model. Consider the case described above, and 
let us generate data from a random sequence of null hypoth- 
esis (Mo) and alternatives (Mi), with fio = 0, a = 1 and 
l_i ~ A/"(Q, 1). Suppose that the proportion of nulls and alter- 
natives is equal. We then compute the p-value using Eq. (|13[) 
and we select all the tests that give p £ [a — e, a + e], for 
a certain value of a and e«a. Among such results, which 
rejected the null hypothesis at the 1 — a level, we then deter- 
mine the proportion that actually came from the null, i.e. 



0.05 
0.01 
0.001 



1.96 
2.58 
3.29 



0.51 
0.20 
0.024 



0.29 
0.11 
0.018 



Table 2. Proportion of wrongly rejected nulls among all results 
reporting a certain p-value (simulation results). This illustrates 
that the p-value is not equal to the fraction of wrongly rejected 
true nulls, which can be considerably worse. This effect does not 
depend on the assumption of Gaussianity nor on the sample size. 
The right most column gives a lower bound on the fraction of 
true nulls derived using Eqs. JH} and ]14t . 



the percentage of wrongly rejected nulls. We assume that 
either Mi or Mq is true. This allows us to use 



P(M \x) = 



l+B 



(14) 



The results are shown in Table We notice that among 
all the "significant" effects at the 2a level about 50% are 
wrong, and in general when there is only a single alternative 
at least 29% of the 2a level results will be wrong. 

The root of this striking disagreement with a common 
misinterpretation of the p-value (namely, that the p-value 
gives the fraction of wrongly rejected nulls in the long run) is 
twofold. While the p-value gives the probability of obtain- 
ing data that are as extreme or more extreme than what 
has actually been observed assuming the null hypothesis is 
true, one is not allowed to interpret this as the probabil- 
ity of the null hypothesis to be true, which is actually the 
quantity one is interested in assessing. The latter step re- 
quires using Bayes theorem and is therefore not defined for 
a frequentist. Also, quantifying how rare the observed data 
are under the null is not meaningful unless we can compare 
this number with their rareness under an alternative hy- 
pothesis. Both these p oi nts are discussed in greater detail in 
iBerger fc Sellkd (|l987l ); ISellke et all l|200ll ); lBerge3 (|2003D . 



4 SCALAR SPECTRAL INDEX 

Here we evaluate the upper bounds on the Bayes factor for 
the scalar spectral index (n) using WMAP combined with 
other data, comparing a Harrison-Zeldovich model (n = 1) 
to a model where n can assume other values. For this prob- 
lem, there are well motivated priors. If the primordial per- 
turbations are from slow roll inflation, then 



n = 1 + 2r) - 



(15) 



where n and e are the slow roll parameters and need to 
be much less than one. For most models e <C r\ and so a 
reasonable prior bound is 



0.8<n<1.2. 



(16) 



which can be implemented by taking a Gaussian prior of the 
form 

p(n s \Mi) = Af(fi = 1.0, a — 0.2). (17) 
However, if the inflation potential (V) is of the form 



V = V - -m 2 



(18) 
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data 



AXcft p-value In B In B 



B 



WMAP 

(Spereel et al. 2007) 



0.014 



1.8 



WMAPext+SDSS 
+2df+No SZ 
(Parkinson et al. 2006) 



0.005 2.0 2.7 15 



WMAPext+HST 
(Kunz et al. 2006) 



0.004 2.7 2.8 16 



WMAPext+HST+SDSS 
(Trotta 2007) 



11 0.001 2.9 3.9 49 



Table 3. The odds against a Harrison-Zeldovich spectrum. The 
p-values where estimated from ^X^s usm g Eq. JToJl. The upper 
bounds on the Bayes factor were estimated using Eq. J5). Where 
InB is available it was calculated with the prior of Eq. H17I I. 



(where 



is the inflaton, and Vb and m a re con s tants) 
then inflation can occur with r\ ~ 1 l|Lindel 12001 



iBoubekeur fc Lvthll2005l ) and so a larger range of n may 
be considered for the prior. 

As there is such a broad range for the the prior on n, 
it is useful to evaluate what is the upper bound on the odds 
for a non-Harrison-Zeldovich spectrum, n / 1. In Table [3] 
we list a number of different studies of the variation of the 
spectral index for a range of data. Where the Bayes factor 
has been worked out it can be seen that our estimate of 
the upper bound is always more than the evaluated version. 
Also, for the case with the greatest amount of data there is 
quite a large discrepancy between the upper bound and the 
evaluated odds. This makes sense as the same prior for n 
was used (Eq. (|17[) ) but now the data is more constraining 
and so the maximizing prior is narrower. Using the most 
constraining data combination (WMAPext+HST+SDSS) 
the upper limits on the odds against n = 1 is 49:1. 
However, the odds against Harrison-Zeldovich could be 
weakened by various systematic effects in data analysis 
choices, e.g. inclusion of gravitational lensing, beam mod- 
elling, not including Sunyaev-Zeldo vich (SZ) marginaliza- 



tion, and po i nt-source subtrac t ion (IPeiris fc EastherJ 
Lewis! 120061; iParkinson et al.l 120061; lEriksen et all 



Huffcnbcrger et al.|[2006; Spergel et ai]|2007l) 



2006 



2007 



data 



AXcff p-value InB InB B 



WMAP (7°) 
(Spereel et al. 2007) 



0.1 



WMAP (7°)+C m arp, 
(Gordon 2007) 



0.03 



1.3 



WMAP (3.6°)+C mar g 
(Eriksen et al. 2007) 



11 



0.01 1.8 2.16 9 



Table 4. The odds for dipolar modulation, A =i 0. The res- 
olution of the data used is also indicated. The C ma rg refers to 
marginalisation over a non-modulated monopole and dipole. In B 
was evaluated using Eq. JH}. 



sharply with scale. However, the required modulation should 
probably extend a ll the way to scales associ ated with the 
harmo nic I = 40 (jHansen et al.ll2004l ). Also, llnoue fc Silkl 
(2006) postulated that Poisson distributed voids may be re- 
sponsible for the asymmetry. But, a generating mechanism 
for the voids and a detailed likelihood analysis are presently 
lacking. 

Therefore at present there is not a concrete enough the- 
ory to place meaningful prior limits on A. However, we 
can still work out the upper limit on the Bayes factor. 
The p-values can be evaluated from Eq. (|10[) . Although 
A = is on the boundary of the parameter space, the prob- 
lem can be reparameterized in Cartesian coordinates where 
A = w'i + Wy + w\ and Wi is a linear modulation weight for 
spatial dimension i. Then the Wi = point, for all i, will not 
be on the edge of the parameter space and so Eq. (|10|) can 
be used. 

The results are shown in Tabl e 4. Simulations had been 
done for the last row's p-value IjEriksen et al.l 120071 ) and 
were in excell e nt ag reement with the result from Eq. (|10[) . 
lEriksen et al.l l|2007h did compute the Bayes factor, taking 
as the prior ^ A 0.3 but did not give a justification for 
that prior except that it contained all the non-negligible like- 
lihood. This is unproblematic for parameter estimation, but 
is ambiguous for working out the Bayes factor. For exam- 
ple if the prior range for A was extended to be ^ A ^ 0.6 
then the Bayes factor would decrease by 2 but the parameter 
estimates would be unaffected. 



5 ASYMMETRY IN THE CMB 

In the recent WMAP 3-yr release the isotropy of the CMB 
fluctuations was tes ted using a dipolar modulating function 
i|Spergel et al.ll2007l ) 

AT(fi) = AT iso (n)(l + 4n-d) (19) 

where AT is the CMB temperature fluctuations in direction 
n, ATiso are the underlying isotropically distributed temper- 
ature fluctuations, A is the amplitude of the isotropy break- 
ing, and d is the direction of isotropy breaking. The isotropy 
of the fluctuations can then be tested by evaluating whether 
A — 0. The problem with using the Bayes ratio in this case 
is that there is no good underlying model which produces 
this type of iso t ropy b reaking. An attempt was made by 
iDonoghue et~ai1 (|2007f> to allow an initial gradient in the 
inflaton field but they found that the modulation dropped 



6 CONCLUSIONS 

Bayesian model selection provides a powerful way of evalu- 
ating whether new parameters are needed in a model. There 
are however cases where the prior for the new parameter 
can be uncertain, or physically difficult to motivate. Here 
we have looked at priors which maximize the Bayes factor 
for the new parameters. This puts the reduced model under 
the most strain possible and so tells the user what the best 
case scenario is for the new parameters. We have pointed 
out a common misinterpretation of the meaning of p-values, 
which often results in an overestimation of the true signifi- 
cance of rejection tests for null hypotheses. 

Using Bayesian calibrated p-values we have evaluated 
upper bounds on the Bayes factor for the spectral index. 
We have found that the best the current data can do is 
provide moderate support (odds ^ 49 : 1) for n / 1. We 
also looked at the maximum Bayes factor for a modulation 
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in the WMAP CMB temperature data. We found that the 
current data can at best provide weak support (odds ^9:1) 
for a departure from isotropy. 

The comparison between p-values and Bayes factors 
suggests a threshold of p = 3 x 1CP 4 or a — 3.6 is needed if 
the odds of 150:1 ("strong" support at best) are to be ob- 
tained. It is difficult to detect systematics which are smaller 
than the statistical noise and so systematic effects in the 
data analysis typically lead to a shift of order a sigma. It 
follows that the "particle physics discovery threshold" of 5a 
may be required in order to obtain odds of at best 150:1. 
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